4D Analytics

Regression Analysis

Last updated: July 10, 2020

Linear regression is a well-established statistical technique in which two related variables are analyzed to determine whether there is a correlation between them and, if so, to model that relationship as a straight line.

For example, consider two data series: one representing external temperature (T) and the other representing the electrical load (L) of an air-conditioning system. We can surmise that the two parameters are related, since we can reasonably expect the load to increase as the temperature increases.

In this case, T is referred to as the independent variable or indicator, and L is the dependent variable. We can take a sample data set of T and L values and plot it on a scatter graph, with the independent variable on the x-axis and the dependent variable on the y-axis.
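Before fitting a line, the strength of the linear relationship in a sample like this can be checked with the Pearson correlation coefficient r, which ranges from -1 to +1. The Python sketch below computes it from first principles; the function name `pearson_r` is hypothetical, and the temperature and load figures are invented for illustration rather than taken from 4D Analytics.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series.

    Near +1: y tends to rise as x rises. Near -1: y tends to fall as x rises.
    Near 0: no linear relationship.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample: external temperature T (°C) and A/C load L (kW)
temperature = [18, 21, 24, 27, 30, 33]
load = [12.0, 14.5, 17.8, 20.1, 23.9, 26.2]
r = pearson_r(temperature, load)
print(f"r = {r:.3f}")
```

A value of r close to +1, as with this made-up sample, supports the expectation that load rises with temperature; it says nothing about whether the relationship is causal.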

Running a linear regression on this data set produces a best-fit line through the plotted points.
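As a sketch of what "best fit" usually means, the Python below fits L = intercept + slope × T by ordinary least squares (the method used by R's `lm()`), i.e. the line that minimizes the sum of squared vertical distances from the points. The document does not say which R function 4D Analytics calls, so treat this as an illustration; `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical sample: external temperature T (°C) and A/C load L (kW)
temperature = [18, 21, 24, 27, 30, 33]
load = [12.0, 14.5, 17.8, 20.1, 23.9, 26.2]
b0, b1 = fit_line(temperature, load)
print(f"L = {b0:.2f} + {b1:.3f} * T")
```

The slope is the estimated change in load per degree of temperature, which is usually the quantity of practical interest.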

Note: This is a single linear regression (i.e., it produces a single line). We will also provide the option of performing a dual linear regression; this technique is described further in the relevant section.

A Note on Linear Regression

When a regression is run using R, the engine returns a number of useful outputs based on the data set. Amongst these are standard errors, from which 95% confidence intervals are derived. These values indicate how well the model fits the data set; in addition, we can use them when analyzing new data to identify values that fall outside the range of the model.
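The Python sketch below computes the residual standard error and the coefficient standard errors (the quantities R's `summary()` reports for an `lm` fit), then uses them to flag new readings that fall outside an approximate 95% band around the line. Two assumptions are worth stating: it uses the prediction interval, which adds the scatter of individual points to the uncertainty of the line and is the usual choice for judging single new observations (R's `predict()` offers both `interval = "confidence"` and `interval = "prediction"`); and it uses the normal value 1.96 rather than a Student t quantile. The function names and data are hypothetical.

```python
import math


def fit_with_errors(xs, ys):
    """OLS fit of y = intercept + slope * x plus its standard-error quantities."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard error: spread of points around the line (n - 2 degrees of freedom)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(sse / (n - 2))
    return {
        "intercept": intercept,
        "slope": slope,
        "sigma": sigma,
        "se_slope": sigma / math.sqrt(sxx),
        "se_intercept": sigma * math.sqrt(1 / n + mean_x ** 2 / sxx),
        "n": n,
        "mean_x": mean_x,
        "sxx": sxx,
    }


def flag_outside(model, new_points, crit=1.96):
    """Return the (x, y) pairs lying outside an approximate 95% prediction interval.

    crit=1.96 is a large-sample normal approximation (an assumption); for small
    samples a t quantile with n - 2 degrees of freedom gives a wider, more accurate band.
    """
    flagged = []
    for x, y in new_points:
        fit = model["intercept"] + model["slope"] * x
        # Standard error for a single new observation at x
        se_pred = model["sigma"] * math.sqrt(
            1 + 1 / model["n"] + (x - model["mean_x"]) ** 2 / model["sxx"]
        )
        if abs(y - fit) > crit * se_pred:
            flagged.append((x, y))
    return flagged


# Hypothetical sample: external temperature T (°C) and A/C load L (kW)
temperature = [18, 21, 24, 27, 30, 33]
load = [12.0, 14.5, 17.8, 20.1, 23.9, 26.2]
model = fit_with_errors(temperature, load)
new_readings = [(25, 18.5), (28, 30.0)]
print(flag_outside(model, new_readings))
```

A small residual standard error relative to the load values indicates a tight fit; a flagged reading is a candidate for investigation, not proof of a fault.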

R's regression functions can produce data sets containing the confidence intervals around the regression line(s). 4D Analytics will import these data sets for use in further analysis.
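For reference, R's `predict()` for a linear model returns columns named `fit`, `lwr` and `upr` when called with `interval = "confidence"`. The Python sketch below builds an equivalent table over a grid of temperatures and serializes it as CSV. The CSV layout is an assumption, since the document does not specify the format 4D Analytics imports; `confidence_band`, `to_csv` and the sample data are hypothetical, and 1.96 is again a normal approximation to the t quantile.

```python
import csv
import io
import math


def confidence_band(xs, ys, grid, crit=1.96):
    """Rows of (x, fit, lwr, upr): an approximate 95% confidence band for the fitted line."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(sse / (n - 2))
    rows = []
    for x in grid:
        fit = intercept + slope * x
        # Standard error of the mean response: smallest at mean_x, growing away from it
        se_fit = sigma * math.sqrt(1 / n + (x - mean_x) ** 2 / sxx)
        rows.append((x, fit, fit - crit * se_fit, fit + crit * se_fit))
    return rows


def to_csv(rows):
    """Serialize band rows as CSV text (hypothetical exchange format)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["x", "fit", "lwr", "upr"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample: external temperature T (°C) and A/C load L (kW)
temperature = [18, 21, 24, 27, 30, 33]
load = [12.0, 14.5, 17.8, 20.1, 23.9, 26.2]
band = confidence_band(temperature, load, grid=range(15, 37, 3))
print(to_csv(band))
```

The band is narrowest near the mean temperature and widens towards the edges of the data, so values extrapolated well beyond the sampled range carry more uncertainty.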